Abstract

The principal purpose of this research was to provide new tools for measuring
probabilities of failure-free operation of software systems and to develop methods
for predicting software reliability. Links are established between stochastic
models of fault occurrence suggested by Scholz (1985) and Miller (1986) and an
important class of finite population sampling models called "successive sampling"
in the sample survey literature. Successive sampling consists of sampling a finite
population of objects, each with an assigned magnitude, proportional to magnitude
and without replacement. Recognition of linkages between Scholz's and Miller's
"Exponential Order Statistics Models" and successive sampling allows application of
an emerging body of research on methods of estimation for successive sampling
models to software reliability estimation.
AFOSR Contract #AFOSR-89-0371 FINAL REPORT
SOFTWARE  RELIABILITY:  ESTIMATION  AND  PREDICTION 
Gordon M. Kaufman

The principal purpose of this research was to provide new tools for
measuring probabilities of failure-free operation of software systems and to
develop methods for predicting software reliability.

Links are established between stochastic models of fault occurrence
suggested by Scholz (1985) and Miller (1986) and an important class of finite
population sampling models called "successive sampling" in the sample survey
literature. Successive sampling consists of sampling a finite population of objects,
each with an assigned magnitude, proportional to magnitude and without
replacement. Recognition of linkages between Scholz's and Miller's "Exponential
Order Statistics Models" and successive sampling allows application of an
emerging body of research on methods of estimation for successive sampling
models to software reliability estimation.

Two papers are devoted to extensions of the theory of exponential order
statistics models and to presentation of methods of estimation based on a data
record of times to failures. A novel feature is the development of methods of
estimation that account for the distinction between types of software failures
(logic, coding, interface, etc.). In particular, given a data record of both the type of
each observed software failure and the time at which it occurred, the question of
how to estimate the number of faults of each type remaining in the system and the
time on test needed to discover some fraction of these remaining faults is
addressed. Estimation methods studied are maximum likelihood, conditional
maximum likelihood and unbiased estimation:

"Software Reliability Modeling and Exponential Order Statistics," MIT Sloan
School Working Paper 3114-90MS, January 1990 (with G. Andreatta), 45 pp.

"Successive Sampling and Software Reliability," Sloan School Working Paper
3316, July 1991 (in review with IEEE Transactions on Software Engineering), 29 pp.

Profile maximum likelihood methods for estimating remaining faults by type
in NASA/Goddard SEL software test data were presented at a TIMS conference in
November 1992. This paper is in progress, along with a paper on a Bayesian
treatment of successive sampling inference entitled "Bayesian Successive
Sampling Inference". An invited presentation on the latter topic was given at
the Latin-American-U.S. Workshop on Bayesian Statistics and Econometrics in
Caracas, Venezuela, December 9-14, 1992.

At the termination of this contract, development of efficient numerical schemes
for solution of the non-linear efficient score functions for profile maximum likelihood,
and work on Bayesian alternatives for estimation in light of observation of
NASA/Goddard-type data, are under way.


□ SHOW THAT EXPONENTIAL ORDER STATISTICS MODELS (EOS) = SS

□ HOW SS ESTIMATION METHODS CAN APPLY
TO SOFTWARE RELIABILITY TO:

□ ESTIMATE RETURNS TO TESTING EFFORT
(A) HOW MANY FAULTS OF WHAT TYPE REMAIN?

(B) HOW MUCH ADDED TIME ON TEST IS NEEDED
TO UNCOVER m MORE FAULTS?

(C) IF WE TEST FOR T MORE UNITS OF TIME,
HOW MANY FAULTS OF WHAT TYPES WILL
BE OBSERVED?


ERBS PROJECT (NASA-GODDARD-SEL)
SEL-GODDARD MODEL

I. FAULT TYPES ARE DISTINGUISHABLE

II. RELIABILITY GROWTH CAPTURED BY SS

III. THE # OF FAULTS OF EACH TYPE ARE SUPER-POPULATION GENERATED

IV. THE SUPER-POPULATION IS NON-PARAMETRIC
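SS + SUPERPOPULATION PROCESS

(I) GIVEN $A_N = \{a_1, \ldots, a_N\}$ AND AN ORDERED SAMPLE $s_n = (i_1, \ldots, i_n)$,

$$\Pr\{s_n \mid A_N\} = \prod_{j=1}^{n} \frac{a_{i_j}}{\sum_{k=1}^{N} a_k - \sum_{l=1}^{j-1} a_{i_l}}$$

(II) $a_1, \ldots, a_N$ ARE VALUES OF N IID RVS $A_1, \ldots, A_N$ WITH COMMON CDF $F(\cdot \mid \theta)$
CONCENTRATED ON $(0, \infty)$:

$$\Pr\{A_j = \alpha_k\} = \theta_k, \quad k = 1, 2, \ldots, K, \qquad \sum_{k=1}^{K} \theta_k = 1, \quad \theta_k > 0.$$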
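As a minimal illustration of item (I), the following Python sketch evaluates the probability of an ordered successive sample given the population magnitudes, as a product of draw-by-draw probabilities, each proportional to magnitude among the elements not yet drawn. The magnitudes and the function name `ordered_sample_prob` are hypothetical, chosen only for illustration.

```python
from itertools import permutations


def ordered_sample_prob(a, sample):
    """Probability that successive sampling (proportional to magnitude, without
    replacement) draws the elements with indices `sample`, in that order."""
    remaining = sum(a)  # total magnitude of elements not yet drawn
    prob = 1.0
    for i in sample:
        prob *= a[i] / remaining  # chance that the next draw is element i
        remaining -= a[i]         # element i leaves the population
    return prob


# Hypothetical magnitudes for a population of N = 3 faults.
a = [1.0, 2.0, 3.0]
p = ordered_sample_prob(a, (2, 0))  # (3/6) * (1/3)

# Summed over every ordered sample of size n = 2, the probabilities total 1.
total = sum(ordered_sample_prob(a, s) for s in permutations(range(len(a)), 2))
print(p, total)
```

The check that the probabilities of all ordered samples of a fixed size sum to one is a cheap sanity test for any implementation of the product formula.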
ANALYSIS

(1) A NON-PARAMETRIC PROFILE ML ESTIMATE
OF PROPORTIONS OF EACH FAULT TYPE

(2) BOUNDS ON PARAMETER ESTIMATES

(3) PROFILE MLE FOR NUMBER OF FAULTS BY TYPE
HOW MANY FAULTS REMAIN?

PROFILE MLE REMAINING FAULTS*

                      1.0    .9    .8    .7    .6
COMPUTE                 0     ?     6     9    15
DATAVAL                 0     ?     ?     ?    16
INIT                    0     ?     ?     5     9
INTERE                  0     1     3     4     6
INTERI                  0     4     1     2     2
LOGIC                   0     4    10    18    30
TOTAL REMAINING         0    15    29    50    78
N - n = n(1 - f)/f      0    13    29    49    76

? = ENTRY NOT RECOVERABLE FROM THE SOURCE (THE VALUES 3 FOR DATAVAL AND 1 FOR INIT SURVIVE, BUT THEIR COLUMNS ARE UNCLEAR)

*FOR K = 2 ONLY!
SOFTWARE RELIABILITY MODELING AND
EXPONENTIAL ORDER STATISTICS

by 

Giovanni Andreatta and Gordon M. Kaufman*

ABSTRACT: Properties of software failure times modelled as realizations of order statistics
generated by independent but non-identically distributed exponential random variables
are developed. Edgeworth and saddle point approximations to central order statistic
densities so generated are developed using an exact integral representation of these densities.
A comparison of Edgeworth and saddle point approximations with exact densities for
two different population types is given. The accuracy of the saddle point approximation,
even for very small population sizes (N ≤ 6) and small samples (n ≤ 3), is excellent.

The same technique is used to provide an exact integral representation of the probability
that a particular fault appears in a sample of a given size. Some numerical comparisons
of Rosen's (1972) approximation of inclusion probabilities with exact values are provided.
His simple approximation appears to give excellent results as well.
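The two quantities compared here can be sketched in Python for a tiny population. In this hedged sketch, exact inclusion probabilities are obtained by brute-force enumeration of all ordered samples (feasible only for very small N, and not the integral representation developed in the paper), while Rosen's approximation is taken, as an assumption, in the form commonly used in the successive sampling literature: pi_i is approximately 1 - exp(-a_i tau), with tau solving sum_j (1 - exp(-a_j tau)) = n. The magnitudes and function names are illustrative.

```python
from itertools import permutations
from math import exp


def ordered_sample_prob(a, sample):
    """Probability of drawing `sample`, in order, by successive sampling."""
    remaining, prob = sum(a), 1.0
    for i in sample:
        prob *= a[i] / remaining
        remaining -= a[i]
    return prob


def exact_inclusion(a, n):
    """Exact inclusion probabilities by enumerating all ordered samples of size n
    (brute force, practical only for very small populations)."""
    pi = [0.0] * len(a)
    for s in permutations(range(len(a)), n):
        p = ordered_sample_prob(a, s)
        for i in s:
            pi[i] += p
    return pi


def rosen_inclusion(a, n, iterations=200):
    """Approximation pi_i ~ 1 - exp(-a_i * tau), where tau solves
    sum_j (1 - exp(-a_j * tau)) = n; tau is located by bisection."""
    if not 0 < n < len(a):
        raise ValueError("need 0 < n < N")

    def expected(tau):
        return sum(1.0 - exp(-aj * tau) for aj in a)

    lo, hi = 0.0, 1.0
    while expected(hi) < n:  # widen the bracket until it contains the root
        hi *= 2.0
    for _ in range(iterations):  # fixed number of halvings of the bracket
        mid = 0.5 * (lo + hi)
        if expected(mid) < n:
            lo = mid
        else:
            hi = mid
    tau = 0.5 * (lo + hi)
    return [1.0 - exp(-aj * tau) for aj in a]


a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # hypothetical population, N = 6
exact = exact_inclusion(a, 3)
approx = rosen_inclusion(a, 3)
for ai, e, r in zip(a, exact, approx):
    print(f"a = {ai:.0f}: exact {e:.4f}, Rosen {r:.4f}")
```

Both sets of probabilities sum to the sample size n, which makes a convenient consistency check.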

The intimate connection between successive sampling theory and EOS models for
software reliability is documented.

KEY WORDS: SOFTWARE RELIABILITY, SUCCESSIVE SAMPLING,
EDGEWORTH APPROXIMATION, SADDLE POINT
APPROXIMATIONS, INCLUSION PROBABILITY,
ORDER STATISTICS


*  Supported  by  AFOSR  Contract  #AFOSR-89-0371 
1. INTRODUCTION

Goel (1985) has defined software reliability as the probability that during a prespecified
testing or operational time interval, software faults do not cause a program to fail.

"Let F be a class of faults, defined arbitrarily, and T be a
measure of relevant time, the units of which are dictated by the
application at hand. Then the reliability of the software with
respect to the class of faults F and with respect to the metric T is
the probability that no fault of the class occurs during the
execution of the program for a prespecified period of relevant time."
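Under the exponential order statistics assumptions discussed in the next paragraph (each fault not yet observed occurs after an independent exponential waiting time whose rate is that fault's occurrence rate), this definition reduces to a closed form: by memorylessness, the probability that no fault of class F occurs during the next T units of relevant time is exp(-T times the sum of the rates of the unobserved faults in F). The Python sketch below assumes those rates are known, which in practice they are not; the function name and the rates are hypothetical.

```python
from math import exp


def eos_reliability(remaining_rates, duration):
    """Probability that none of the faults with the given occurrence rates
    causes a failure during the next `duration` units of relevant time.

    Assumes independent, memoryless (exponential) fault occurrence times, so the
    result is the product of exp(-rate * duration) over the remaining faults.
    """
    return exp(-duration * sum(remaining_rates))


# Hypothetical rates (failures per hour on test) for the unobserved faults of a class F.
rates = [0.01, 0.02, 0.005]
print(eos_reliability(rates, 10.0))  # exp(-0.35)
```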


Among the most prominent are models built on the assumptions that waiting times between software
failures are exponentially distributed and, in addition, are mutually independent conditional on knowledge of
the appropriate parameter set. Such models have been called Exponential
Order Statistics (EOS) models by Miller (1986) in his investigation of similarities
of and differences between models based on the aforementioned assumptions. Littlewood
(1981) was perhaps the first to challenge the assumption, adopted by many authors, that
each fault "...contributes the same amount to the overall failure rate...". Littlewood posits a model
in which (a) each fault possesses a parameter (occurrence rate) individual to that fault
and (b) the collection of fault parameters is generated by a superpopulation process. This
approach has the decisive advantage of avoiding some analytical and computational
complexities that arise when assumption (b) is dropped. It is empirical Bayes in spirit and so
is in formal correspondence with the Bayesian approach to reliability modeling adopted by
Singpurwalla and his co-authors (Langberg and Singpurwalla (1985), for example). However,
Miller argues that Littlewood's model minus assumption (b), a model that he calls
a deterministic EOS model, "...has a certain physical motivation: the individual failure
rates are physical quantities in the sense that they can be estimated to any desired degree
of accuracy. The IDOS [empirical Bayes] and NHPP [non-homogeneous Poisson process]
models are attractive because of mathematical tractability and successful application
experience; however, they are more difficult to motivate and verify in a physical sense."
(Miller (1986), p. 12). In sum, some researchers view the EOS model as a first-principles
model that captures the physics of fault occurrence more accurately than the alternatives
explored in the literature. This led Miller (1986) and Scholz (1986) to explore properties of
order statistics generated by mutually independent but non-identically distributed random
variables, the analytical concomitant of the EOS model.

The connection of this line of research with a sampling scheme well known to sample
survey statisticians (successive sampling, or sampling proportional to magnitude and
without replacement from a finite population of magnitudes) has passed unnoticed until
now. One of the purposes of this paper is to establish the nature of this connection. The
problem of making inferences about unobserved finite population parameters of the EOS
model based on observation of waiting times between failures, and possibly the magnitudes
of observed faults, is a dual of the problem of inference based on observation of fault
magnitudes alone. The latter problem has been investigated in detail by several authors
(Andreatta and Kaufman (1986); Gordon (1989); Wang and Nair (1986); Nair, and
Wang (1989)). Other features of the link between software reliability models and successive
sampling appear in a companion paper (Kaufman (1989b)).

Another purpose is to provide tools for the computation of the distribution of central
order statistics for the EOS model and for the probability that a fault possessing a
prespecified magnitude will be included in a sample of faults of a given size. Both play an
important role in theories of inference for EOS models. The distribution of the waiting
time to occurrence of the nth fault is an analytical benchmark for understanding properties
of the EOS model and for a theory of unbiased estimation of the empirical distribution
of magnitudes of unobserved faults and of the number of faults remaining in the software
system.

Gordon (1982) has shown that the distribution of permutations of the order in which
successively sampled elements of a finite population are observed can be characterized in
terms of exponential waiting times with expectations inversely proportional to magnitudes
of the finite population elements. This leads naturally to a corollary interpretation of the
probability that a particular element of the population will be included in a sample as the
expectation of an exponential function of an order statistic generated by independent but
non-identically distributed exponential random variables (rvs).
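Gordon's characterization can be checked by simulation. In the hedged Python sketch below, the magnitudes and seed are arbitrary choices for illustration. Each element receives an independent exponential waiting time with rate equal to its magnitude (mean equal to the reciprocal of the magnitude), and the order of the waiting times is taken as the order of observation. The empirical frequencies should then match the successive sampling probabilities: 3/6 for the element of magnitude 3 being drawn first, and (3/6)(1/3) for the ordered pair (2, 0).

```python
import random


def exponential_race_sample(a, n, rng):
    """Ordered sample of size n obtained by giving element i an exponential
    waiting time with rate a[i] (mean 1/a[i]) and keeping the n earliest."""
    times = [rng.expovariate(ai) for ai in a]
    order = sorted(range(len(a)), key=lambda i: times[i])
    return tuple(order[:n])


rng = random.Random(12345)  # fixed seed for reproducibility
a = [1.0, 2.0, 3.0]         # hypothetical magnitudes
reps = 100_000
samples = [exponential_race_sample(a, 2, rng) for _ in range(reps)]
freq_first = sum(s[0] == 2 for s in samples) / reps  # successive sampling: 3/6
freq_pair = sum(s == (2, 0) for s in samples) / reps  # successive sampling: (3/6) * (1/3)
print(freq_first, freq_pair)
```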

In Section 3 we present an exact integral representation of the marginal density of
an order statistic so generated. The integrand is interpretable as a probability mixture of
characteristic functions of sums of conditionally independent Bernoulli rvs, an interpretation
that suggests a first approximation of the density and the form that leading terms in
Edgeworth and saddle-point approximations will take.
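The Bernoulli structure can be seen directly. For fixed t, the indicators 1{X_i <= t} are independent Bernoulli variables with success probabilities 1 - exp(-a_i t), so Z_(n) <= t exactly when at least n of them equal 1. The Python sketch below computes the distribution function and density of Z_(n) by convolving these Bernoulli distributions. It is an alternative exact computation under the stated EOS assumptions, not the integral representation of Section 3, and the magnitudes are hypothetical.

```python
from math import exp


def poisson_binomial_pmf(p):
    """pmf of a sum of independent Bernoulli(p_i) variables, by convolution."""
    pmf = [1.0]
    for pi in p:
        new = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            new[k] += mass * (1.0 - pi)  # this Bernoulli variable is 0
            new[k + 1] += mass * pi      # this Bernoulli variable is 1
        pmf = new
    return pmf


def order_stat_cdf(a, n, t):
    """P(Z_(n) <= t): at least n of the N exponential times are <= t."""
    p = [1.0 - exp(-ai * t) for ai in a]
    return sum(poisson_binomial_pmf(p)[n:])


def order_stat_density(a, n, t):
    """Density of Z_(n) at t: some fault i occurs at t while exactly n - 1
    of the other faults have already occurred."""
    total = 0.0
    for i, ai in enumerate(a):
        others = [1.0 - exp(-aj * t) for j, aj in enumerate(a) if j != i]
        total += ai * exp(-ai * t) * poisson_binomial_pmf(others)[n - 1]
    return total


a = [0.5, 1.0, 2.0]  # hypothetical magnitudes (rates), N = 3
for n in (1, 2, 3):
    print(n, order_stat_cdf(a, n, 0.7), order_stat_density(a, n, 0.7))
```

For n = 1 and n = N the results reduce to the familiar minimum and maximum of independent exponentials, which gives simple checks.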

An Edgeworth-type approximation is presented in Section 4. While this expansion
could in principle be derived by first computing a saddle-point approximation and then
using the idea of recentering a conjugate distribution as suggested by Daniels (1954), we
have chosen to compute it directly.
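The leading term of an Edgeworth-type expansion is a normal approximation. As a hedged sketch (not the expansion derived in Section 4, whose correction terms are omitted here), the Python code below applies that leading term to P(Z_(n) <= t) = P(S_t >= n), where S_t counts the faults observed by time t, using the mean and variance of the Bernoulli sum plus a continuity correction, and compares it with the exact value obtained by convolution. The magnitudes are hypothetical.

```python
from math import erf, exp, sqrt


def poisson_binomial_pmf(p):
    """pmf of a sum of independent Bernoulli(p_i) variables, by convolution."""
    pmf = [1.0]
    for pi in p:
        new = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            new[k] += mass * (1.0 - pi)
            new[k + 1] += mass * pi
        pmf = new
    return pmf


def exact_cdf(a, n, t):
    """P(Z_(n) <= t) = P(S_t >= n), S_t = number of faults observed by time t."""
    p = [1.0 - exp(-ai * t) for ai in a]
    return sum(poisson_binomial_pmf(p)[n:])


def normal_cdf_approx(a, n, t):
    """Leading (normal) term for P(S_t >= n), with a continuity correction."""
    p = [1.0 - exp(-ai * t) for ai in a]
    mean = sum(p)
    sd = sqrt(sum(pi * (1.0 - pi) for pi in p))
    z = (n - 0.5 - mean) / sd
    return 0.5 * (1.0 - erf(z / sqrt(2.0)))  # 1 - Phi(z)


a = [1.0 + i / 20.0 for i in range(40)]  # hypothetical magnitudes, N = 40
n = 24
for t in (0.3, 0.5, 0.8):
    print(t, exact_cdf(a, n, t), normal_cdf_approx(a, n, t))
```

The normal term alone ignores skewness; the higher-order Edgeworth and saddle-point corrections discussed in the paper are what make small populations tractable.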
SUCCESSIVE SAMPLING AND SOFTWARE RELIABILITY

by

Gordon M. Kaufman*

1.  Introduction 

A software system is tested and times between failures are observed. How many faults
remain in the system? What is the waiting time to the next failure? To the occurrence
of the next n failures? Conditional on the observed history of the test process, knowledge
of properties of the time on test necessary to discover the next n faults is very useful for
making test design decisions.
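For these questions, the successive sampling view gives a simple answer when the magnitudes (failure rates) of the remaining faults are known. The wait until the next failure is exponential with rate equal to their sum, and, by memorylessness, each later spacing is exponential with rate equal to the total magnitude still remaining, whichever fault happens to occur. The hypothetical Python sketch below computes the expected time on test until m more faults appear by recursing over which faults have been found. It assumes known magnitudes (in practice they must be estimated) and its cost grows exponentially with the number of remaining faults, so it suits only small illustrations.

```python
from functools import lru_cache


def expected_wait(a, m):
    """Expected time on test until m more faults are observed, assuming the
    remaining faults have known magnitudes `a` (failure rates) and occur at
    independent exponential times."""
    if not 0 <= m <= len(a):
        raise ValueError("m must be between 0 and the number of remaining faults")

    @lru_cache(maxsize=None)
    def rec(remaining, k):
        # `remaining` is a frozenset of indices of faults not yet observed.
        if k == 0:
            return 0.0
        total = sum(a[i] for i in remaining)
        # The next spacing has mean 1/total whichever fault occurs next;
        # fault i occurs next with probability a[i]/total (successive sampling).
        return 1.0 / total + sum(a[i] / total * rec(remaining - {i}, k - 1)
                                 for i in remaining)

    return rec(frozenset(range(len(a))), m)


a = [0.05, 0.02, 0.01, 0.01, 0.005]  # hypothetical rates per hour on test
print(expected_wait(a, 1))  # equals 1 / sum(a)
print(expected_wait(a, 3))
```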

Times between failures models of software reliability are designed to answer such
questions. Many versions of such models appear in the literature on software reliability,
and most rely on the assumption that the failure rate is proportional to the
number of remaining faults or to some single-valued function of the number of remaining
faults. Goel (1985) observes that this is a reasonable assumption if the experimental
design of the test assures equal probability of executing all portions of the code, a
design seldom achieved in practice. The character of testing usually varies with the test
phase: requirements, unit, system or operational. The impact of such considerations
has been recognized by some authors: Littlewood's criticism of the Jelinski-Moranda
assumption that software failure rate at any point in time is directly proportional to the
residual number of faults in the software is cited by Langberg and Singpurwalla (1985)
in an excellent overview paper. Only recently have some researchers come to grips with
the implications of relaxing this assumption. In terms of counts of failures, it may be
labelled an "equal bug size" postulate (Scholz (1986)). Littlewood (1981) and Langberg
and Singpurwalla (1985) do incorporate the assumption that different bugs may have
different failure rates, but the empirical Bayes (superpopulation) approach adopted by
Littlewood and the Bayesian approach adopted by Singpurwalla and Langberg "average
out" the effects of this assumption. According to Scholz, "...it was not recognized by some
proponents of reliability growth models that relaxing the equal bug size assumption also
[has] some complications concerning the independence and exponentiality [of waiting
times between failures]". He and Miller (1986) are the first to investigate systematically
(in the absence of a superpopulation process or of a Bayesian prior for failure rates) the
implications of assuming that, given an observational history, each of the remaining bugs in
a software system may possess a different probability of detection at a given point in time.
In contrast to most times between failures models, for successive sampling/EOS models,
times between failures are not independent. As Goel (1985) points out, independence
would be acceptable if "successive test cases were chosen randomly. However, testing,
especially functional testing, is not based on independent test cases, so that the test process
is not likely to be random".

* Supported by AFOSR Contract #AFOSR-89-0371. I wish to thank Nancy Choi and
Tom Wright for valuable programming assistance and Chris Kemerer for insightful
comments.

Scholz presents a multinomial model for software reliability that is identical to Rosen's
characterization of successive sampling stopping times (Rosen, 1972). The correspondence
seems to have gone unnoticed. The "continuous" model based on independent, non-identically
distributed exponential random variables suggested by Scholz as an approximation
to multinomial waiting times is in fact in exact correspondence with a representation
of successive sampling in terms of non-identically distributed but independent exponential
order statistics. Scholz's approximation is in fact Ross's (1985) order statistics
model, which Ross treats from a Bayesian standpoint. Gordon (1983) was among the first to observe that
successive sampling is representable in this fashion. Miller's study of such order statistics
is focused on similarities and differences between types of models derivable from this
particular paradigm.

Joe (1989) provides an asymptotic (large-sample) maximum likelihood theory for parametric
order statistics models and non-homogeneous Poisson models of fault occurrence
that, when the parameter is of fixed dimension, yields asymptotic confidence intervals.
He states that for the general exponential order statistics model, "one cannot expect any
estimate [of the conditional failure rate] to be good because the ratio of parameters to
random variables is too big".

Successive sampling as described in the next section has been successfully used as a
model for the evolution of magnitudes of oil and gas field discoveries and has its roots in
the sample survey literature (see, for example, Hajek (1981)). In this application, magnitudes
of fields in order of discovery are observed and used to make predictions of the empirical
frequencies of magnitudes of undiscovered fields. Logically tight theories of maximum
likelihood, moment type and unbiased estimation for this class of problems have been
developed by Bickel, Nair and Wang (1992), Gordon (1992) and Andreatta and Kaufman
(1986). The problem of estimation of software reliability based on observation of times
between failures of a software system may be viewed as the dual to the problem of inference
when only magnitudes of population elements are observed. The principal purpose of this
paper is to establish connections between these two disparate lines of research and to lay out
possibilities for applying methods of estimation developed for successive sampling schemes
to successive sampling as a model for software reliability. Our attention is restricted to
successive sampling of elements of a finite population of software faults in a software system;
that is,

(1) Individual faults may possess distinct failure rates that depend on covariates particular
to the stage of testing and on other features of the software environment.
For a given fault, that fault's failure rate as a function of such covariates is called
the fault magnitude.

(2) Faults are sampled (a) without replacement and (b) proportional to magnitude.

Some recent studies of successive sampling schemes have assumed the existence of a
superpopulation process generating finite population magnitudes. We shall not.
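Conditions (1) and (2) describe a draw-by-draw mechanism. A minimal Python sketch, with hypothetical magnitudes and no superpopulation process, draws faults one at a time, each time choosing among the faults not yet drawn with probability proportional to magnitude. The empirical frequency of a particular ordered sample can then be checked against the product of the successive draw probabilities, here (3/6)(1/3) for the ordered pair (2, 0).

```python
import random


def successive_sample(a, n, rng):
    """Draw n distinct elements one at a time; at each draw an element not yet
    drawn is chosen with probability proportional to its magnitude."""
    remaining = list(range(len(a)))
    sample = []
    for _ in range(n):
        weights = [a[j] for j in remaining]
        i = rng.choices(remaining, weights=weights, k=1)[0]
        sample.append(i)
        remaining.remove(i)  # without replacement
    return tuple(sample)


rng = random.Random(2024)  # fixed seed for reproducibility
a = [1.0, 2.0, 3.0]        # hypothetical fault magnitudes
reps = 100_000
draws = [successive_sample(a, 2, rng) for _ in range(reps)]
print(sum(d == (2, 0) for d in draws) / reps)  # compare with (3/6) * (1/3)
```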

The accuracy of the model structure as a depiction of the physics of software fault occurrence
depends in part on the validity of the chosen definition for the magnitude of a fault.
Different definitions of magnitude may be required for different environments. Here we
shall assume that the appropriate definition of the magnitude of a fault for the particular
application considered has been resolved. It is NOT easy to resolve, and considerable effort
must be devoted to defining operationally meaningful fault magnitudes. An
example will help to clarify the meaning of "fault magnitude". The Software Engineering
Laboratory at NASA-Goddard has gathered detailed data from six software projects. For
some of these projects, failure data blocked by test phase is recorded in a form that displays
failures by type as a function of cumulative hours on test. Six distinct failure types
labelled "compute", "dataval", "init", "intere", "interi" and "logic" are distinguished in the
ERBS project acceptance phase, for example. The number of failures of each type that
occurred within each week of ten weeks of acceptance phase testing is recorded along with
the weekly number of time-on-test hours expended by all programmers working on this test.

In the context of the successive sampling model of fault occurrence (defined in the
next section), each of these six distinct failure types is associated with a positive number;
let a_i be the number associated with failure type i = 1, ..., 6. The magnitude a_i of fault
type i may be interpreted as the reciprocal of the expectation of an exponential random
variable belonging to each fault of type i. In turn, a_i may be made a function of
one or more directly observed attributes that covary with the type of fault; e.g., each a_i
may be a function of the programmer time necessary to fix faults of type i. Empirical work by
Basili and Petricone (1982) and Basili and Patniak (1986) provides an excellent starting
point for study of how to define such covariates in an operationally meaningful way. But
this is a subject for a different paper.

Following a formal description of successive sampling and of properties of successive sampling
schemes needed in the sequel, two distinct sampling (observational) schemes are presented
in Section 3. The first is a scheme in which both the time from start of testing to
time of occurrence and the magnitude of each fault in a sample of n faults are jointly
observed. With this scheme we can order the observed faults from first to last and assign a
"waiting time" to each fault. In the second scheme, magnitudes of faults in a sample of
n faults are observed along with the waiting time to occurrence of the last fault in the
sample; waiting times to occurrences of individual faults are not observed. The order in
which faults occurred is then lost.

Section 4 is devoted to properties of unbiased estimators of unobserved finite population
parameters for each of the two aforementioned sampling schemes. The connection
between maximum likelihood estimation (MLE) and unbiased estimation established by
Bickel, Nair and Wang (1992) for a successive sampling scheme in which magnitudes alone
are observed is developed for a scheme in which both waiting times to failures and magnitudes
are observed. The results of a Monte Carlo study of properties of both types of
estimators are presented in Section 5.

Section 6 returns to a principal interest of the software manager: conditional on observing
the history of the process up to and including the mth failure, what is the waiting time to
the occurrence of the next n - m failures? Successive sampling theory suggests a simple
point estimator of this waiting time, dependent on the waiting time Z_(m) to occurrence of
the first m faults and on the unordered set of magnitudes of the faults observed
in (0, Z_(m)). A Monte Carlo study of its behavior suggests that this class of estimators of
returns to test effort, measured in faults per unit time on test, is worth further study.